1 Purpose

To extract and visualise tweets and re-tweets containing #dockercon during DockerCon17 (17-21 April 2017).

Borrowing extensively from http://thinktostart.com/twitter-authentification-with-r/

2 Load Data

The data should already have been downloaded using collectData.R. After some processing, this produces a data table with the following variables:

##  [1] "text"             "favorited"        "favoriteCount"   
##  [4] "replyToSN"        "created"          "truncated"       
##  [7] "replyToSID"       "id"               "replyToUID"      
## [10] "statusSource"     "screenName"       "retweetCount"    
## [13] "isRetweet"        "retweeted"        "longitude"       
## [16] "latitude"         "location"         "language"        
## [19] "profileImageURL"  "createdLocal"     "obsDateTimeMins" 
## [22] "obsDateTimeHours" "obsDateTime5m"    "obsDateTime10m"  
## [25] "obsDateTime15m"   "obsDate"          "isRetweetLab"
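
The derived time variables listed above (createdLocal, obsDateTime5m, obsDateTimeHours, obsDate) could be built along the following lines. This is a minimal sketch rather than the code in collectData.R: the toy table `tweetsDT`, the assumption that `created` is stored in UTC, the "America/Chicago" time zone and the choice to round down with `floor_date()` are all assumptions made for illustration.

```r
library(data.table)
library(lubridate)

# Toy stand-in for the downloaded tweets (hypothetical values).
# Assumption: `created` is a POSIXct timestamp in UTC.
tweetsDT <- data.table(
  created = as.POSIXct(
    c("2017-04-18 14:02:10", "2017-04-18 14:03:55", "2017-04-18 14:07:30"),
    tz = "UTC"
  )
)

# Same instant, displayed in conference-local time (assumed zone name)
tweetsDT[, createdLocal := with_tz(created, tzone = "America/Chicago")]

# Round each local timestamp down to the start of its time bin
tweetsDT[, obsDateTime5m := floor_date(createdLocal, "5 minutes")]
tweetsDT[, obsDateTimeHours := floor_date(createdLocal, "hour")]

# Local calendar date of each tweet
tweetsDT[, obsDate := as.Date(createdLocal, tz = "America/Chicago")]
```

The 10 and 15 minute variables would follow the same pattern with a different unit string.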

The table has 7,485 tweets (and 9,890 re-tweets) from 5,690 tweeters between 2017-04-16 19:01:03 and 2017-04-19 22:57:43 (Central Daylight Time).
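
Summary figures of this kind can be computed directly from the table. The sketch below uses a small made-up `tweetsDT` with the same column names (screenName, isRetweet, createdLocal); the values are illustrative only and the time zone name is an assumption.

```r
library(data.table)

# Toy stand-in for the tweet table (hypothetical values)
tweetsDT <- data.table(
  screenName = c("a", "b", "a", "c", "b"),
  isRetweet = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  createdLocal = as.POSIXct(
    c("2017-04-17 09:00:00", "2017-04-17 09:10:00", "2017-04-18 10:00:00",
      "2017-04-18 11:30:00", "2017-04-19 08:15:00"),
    tz = "America/Chicago"
  )
)

nTweets <- tweetsDT[isRetweet == FALSE, .N]    # original tweets
nRetweets <- tweetsDT[isRetweet == TRUE, .N]   # re-tweets
nTweeters <- uniqueN(tweetsDT$screenName)      # distinct screen names
timeRange <- range(tweetsDT$createdLocal)      # first and last (re)tweet
```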

3 Analysis

3.1 Tweets and Tweeters over time

All (re)tweets containing #dockercon 2017-04-17 to 2017-04-20
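
The counts behind a tweets-and-tweeters-over-time plot could be produced roughly as follows. This is a sketch under assumptions: the toy `tweetsDT` is made up, and grouping by obsDateTime5m and isRetweet is one plausible choice rather than necessarily the one used for the figures here.

```r
library(data.table)

# Toy stand-in for the tweet table (hypothetical values)
tweetsDT <- data.table(
  screenName = c("a", "b", "a", "c"),
  isRetweet = c(FALSE, TRUE, FALSE, TRUE),
  obsDateTime5m = as.POSIXct(
    c("2017-04-18 09:00:00", "2017-04-18 09:00:00",
      "2017-04-18 09:00:00", "2017-04-18 09:05:00"),
    tz = "America/Chicago"
  )
)

# Number of (re)tweets and of distinct tweeters in each 5 minute bin,
# split by tweet type; keyby also sorts the result by the grouping columns
perBinDT <- tweetsDT[
  , .(nTweets = .N, nTweeters = uniqueN(screenName)),
  keyby = .(obsDateTime5m, isRetweet)
]
```

The resulting table has one row per time bin and tweet type, which is the shape ggplot2 expects for a line or column chart.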

3.1.1 Day 1 - Monday (Workshops)

All (re)tweets containing #dockercon Monday 17th April 2017

3.1.2 Day 2 - Tuesday (Main Day 1)

All (re)tweets containing #dockercon Tuesday 18th April 2017

3.1.3 Day 3 - Wednesday (Main Day 2)

All (re)tweets containing #dockercon Wednesday 19th April 2017

3.1.4 Day 4 - Thursday (Main Day 3)

3.2 Location (lat/long)

We wanted to make a nice map, but sadly most tweets have no lat/long set: 17,325 of the 17,375 (re)tweets (99.7%) have NA for both values.

All logged lat/long values

latitude      longitude       nTweets
NA            NA              17325
30.26416397   -97.73961067    2
30.26857      -97.73617       1
30.2625       -97.7401        28
30.26470908   -97.7417368     1
30.20226566   -97.66722505    1
42.36488267   -71.02168356    1
37.61697678   -122.38427689   1
30.2672       -97.7639        3
30.2635554    -97.7399303     1
30.2591       -97.7384        1
30.26622515   -97.74327721    1
30.26037      -97.73848       2
30.258201     -97.71264       1
30.25888      -97.73841       2
30.259714     -97.73940054    1
30.26006      -97.73813       1
30.26006      -97.73859       1
30.26036009   -97.73848483    1
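
A table like this can be obtained by counting rows per lat/long pair. The sketch below uses a made-up `tweetsDT` and also computes the share of rows with no coordinates; the numeric column types are an assumption.

```r
library(data.table)

# Toy stand-in for the tweet table (hypothetical values)
tweetsDT <- data.table(
  latitude = c(NA, NA, NA, 30.2625, 30.2625),
  longitude = c(NA, NA, NA, -97.7401, -97.7401)
)

# Count (re)tweets per distinct lat/long pair; NA forms its own group
latLongDT <- tweetsDT[, .(nTweets = .N), by = .(latitude, longitude)]

# Proportion of (re)tweets missing at least one coordinate
propMissing <- tweetsDT[, mean(is.na(latitude) | is.na(longitude))]
```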

3.3 Location (textual)

This appears to be pulled from the user’s profile, although it may also be a ‘guesstimate’ of current location.

Top locations for tweets:

Top 15 locations for tweeting

location              nTweets
NA                    2603
San Francisco, CA     1282
San Francisco         501
Austin, TX            329
Seattle, WA           227
Silicon Valley, CA    198
Paris                 171
Islamabad, Pakistan   146
London                126
New York, NY          123
Charlotte, NC         120
San Jose, CA          114
west tokyo            104
USA                   104
Boulder, CO           103

Top locations for tweeters:

Top 15 locations for tweeters

location              nTweeters
NA                    1036
San Francisco, CA     172
Austin, TX            85
San Francisco         61
Seattle, WA           48
New York, NY          40
San Jose, CA          40
Paris                 38
London, England       34
Palo Alto, CA         29
Paris, France         29
New York              27
Boston, MA            26
London                25
Washington, DC        24
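
The two rankings differ because the first counts rows and the second counts distinct screen names per location. A minimal sketch of both, using a made-up `tweetsDT` (the values are illustrative only):

```r
library(data.table)

# Toy stand-in for the tweet table (hypothetical values)
tweetsDT <- data.table(
  screenName = c("a", "a", "a", "b", "c", "d"),
  location = c("Austin, TX", "Austin, TX", "Austin, TX", "Paris", "Paris", NA)
)

# Locations ranked by number of (re)tweets
topTweetsDT <- head(
  tweetsDT[, .(nTweets = .N), by = location][order(-nTweets)], 15
)

# Locations ranked by number of distinct tweeters
topTweetersDT <- head(
  tweetsDT[, .(nTweeters = uniqueN(screenName)), by = location][order(-nTweeters)], 15
)
```

In the toy data one prolific account puts "Austin, TX" at the top of the first ranking, while "Paris" leads the second because two different accounts tweeted from there.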

3.4 Screen name

Next we’ll try by screen name.

Top tweeters:

Top 15 tweeters

screenName      nTweets
DockerCon       336
theCUBE         161
climbingkujira  128
BettyJunod      124
jpetazzo        124
solomonstre     107
jeanepaul       104
ManoMarks       92
OpenShiftNinja  88
sitspak         85
vmblog          79
kaslinfields    79
SFoskett        79
jameskobielus   75
bsmith626       73

And here’s a really bad visualisation of all of them!

N tweets per 5 minutes by screen name

So let’s re-do that for the top 50 tweeters.

N tweets per 5 minutes by screen name (top 50, most prolific tweeters at bottom)
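
Restricting the plot to the most prolific tweeters might look like the sketch below. It is not the code behind the figure: the toy `tweetsDT`, the value of `topN` and the use of `geom_tile()` are assumptions made for illustration.

```r
library(data.table)
library(ggplot2)

# Toy stand-in for the tweet table (hypothetical values)
tweetsDT <- data.table(
  screenName = c("a", "a", "a", "b", "b", "c"),
  obsDateTime5m = as.POSIXct(
    c("2017-04-18 09:00:00", "2017-04-18 09:00:00", "2017-04-18 09:05:00",
      "2017-04-18 09:00:00", "2017-04-18 09:05:00", "2017-04-18 09:05:00"),
    tz = "America/Chicago"
  )
)

topN <- 2  # the report uses 50; 2 keeps the toy example small

# Screen names of the topN tweeters, most prolific first
topNames <- tweetsDT[, .(nTweets = .N), by = screenName][
  order(-nTweets)][, head(screenName, topN)]

# Tweets per 5 minute bin for those tweeters only
plotDT <- tweetsDT[screenName %in% topNames,
                   .(nTweets = .N), by = .(screenName, obsDateTime5m)]

# ggplot2 draws the first factor level at the bottom of a discrete y axis,
# so this ordering puts the most prolific tweeter at the bottom
plotDT[, screenName := factor(screenName, levels = topNames)]

p <- ggplot(plotDT, aes(x = obsDateTime5m, y = screenName, fill = nTweets)) +
  geom_tile() +
  labs(x = "Time (5 minute bins)", y = "Screen name", fill = "N tweets")
```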

4 About

Analysis completed in: 37.65 seconds using knitr in RStudio with R version 3.3.3 (2017-03-06) running on x86_64-apple-darwin13.4.0.

A special mention must go to twitteR (Gentry, n.d.) for the Twitter API interaction functions and to lubridate (Grolemund and Wickham 2011), which allows timezone manipulation without tears.

Other R packages used:

  • base R - for the basics (R Core Team 2016)
  • data.table - for fast (big) data handling (Dowle et al. 2015)
  • readr - for nice data loading (Wickham, Hester, and Francois 2016)
  • ggplot2 - for slick graphs (Wickham 2009)
  • plotly - fancy, zoomable slick graphs (Sievert et al. 2016)
  • knitr - to create this document (Xie 2016)

References

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Gentry, Jeff. n.d. TwitteR: R Based Twitter Client. http://lists.hexdump.org/listinfo.cgi/twitter-users-hexdump.org.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.